Summary


For $k \ge 2$ independent normal populations with unknown means and a common known variance, the problem of selecting the population with the largest mean and simultaneously estimating the mean of the selected population is considered in the decision theoretic approach of Cohen and Sackrowitz (1988). Under several loss functions with two additive components, one due to selection and one due to estimation, Bayes decision rules are derived and studied. Both the case of equal sample sizes and the case of unequal sample sizes are treated. The "natural" rule, which selects the population with the largest sample mean and then estimates its mean by that sample mean, is critically examined in all situations considered.
1. Introduction


Let $\pi_1, \ldots, \pi_k$ be $k \ge 2$ given normal populations with unknown means $\theta_1, \ldots, \theta_k \in \mathbb{R}$ and a common known variance $\sigma^2 > 0$. Suppose we want to find the population with the largest mean and simultaneously estimate the mean of the selected population; here the observed data are $k$ independent samples of sizes $n_1, \ldots, n_k$ from $\pi_1, \ldots, \pi_k$ with sample means $X_1, \ldots, X_k$, respectively.

With one exception, all results in the vast literature on ranking and selection are restricted to one of the two decision problems. Cohen and Sackrowitz (1988) have presented a decision theoretic framework for the combined decision problem and derived results for the case of $k = 2$ and $n_1 = n_2$.

Selecting the population with the largest sample mean $X_{[k]}$, say, is usually called the natural selection rule, since it is the uniformly best permutation invariant selection procedure for a general class of loss functions if the sample sizes $n_1, \ldots, n_k$ are all equal. However, for unequal sample sizes, the natural selection rule loses much of its quality. In fact, under 0-1 loss, it can perform "worse than at random" if $\theta_1, \ldots, \theta_k$ are close together. This is studied in detail in Gupta and Miescke (1988).

Estimating the mean of the selected population has been considered only under the assumption that the natural selection rule is employed. Knowing that the natural estimator $X_i$ for $\theta_i$, in case of being selected, $i = 1, \ldots, k$, overestimates $\theta_{[k]} = \max\{\theta_1, \ldots, \theta_k\}$, and thus overestimates even more so the mean of the selected population, alternative estimators have been studied for the present and for other experimental models by the following authors: Sarkadi (1967), Dahiya (1974), Cohen and Sackrowitz (1982), Sackrowitz and Samuel-Cahn (1984, 1986), Jeyaratnam and Panchapakesan (1984, 1986, 1988), Vellaisamy and Sharma (1988, 1989), Vellaisamy, Kumar and Sharma (1988), and Venter (1988).

Rather than "estimating after selection", the decision theoretic treatment of the combined selection-estimation problem leads to "selecting after estimation", as has been pointed out by Cohen and Sackrowitz (1988). So far, however, only a few limited situations have been considered. The purpose of this study is to extend known results in several directions. First, the case of $k > 2$ populations needs to be considered, since in selection problems alone, typical features and difficulties do not appear before $k$ is at least equal to three. Second, for the two additive components of the loss function, due to selection and due to estimation, alternatives to 0-1 loss and squared error loss, respectively, have to be examined. Zero-one loss for selection has the undesired effect of a stiff penalty for selecting a non-best population even if its mean is close to the largest mean, and absolute error loss is a reasonable alternative to squared error loss for estimation. Third, the case of unequal sample sizes $n_1, \ldots, n_k$ has to be considered. If selection alone is of concern, better rules than the natural selection procedure have been derived for this purpose in Gupta and Miescke (1988), all of which take into account the precisions with which the sample means $X_1, \ldots, X_k$ represent the unknown means $\theta_1, \ldots, \theta_k$. Thus, in the combined selection-estimation problem, where additionally the precisions of the estimates depend on the respective sample sizes, non-standard decision rules are to be expected. This will be discussed in Section 4, after a general framework has been introduced in Section 2 and the case of equal sample sizes has been treated in Section 3.
2. General Framework


Let $X = (X_1, \ldots, X_k)$ be a random vector of observations which has a density (probability function) $\prod_{i=1}^{k} f_i(x_i \mid \theta_i)$, $x \in \mathbb{R}^k$ ($\mathbb{Z}^k$), $\theta \in \Omega^k \subseteq \mathbb{R}^k$, with respect to the Lebesgue measure on $\mathbb{R}^k$ (counting measure on $\mathbb{Z}^k$). Of course, $X$ may already be a collection of $k$ sufficient statistics for $\theta_1, \ldots, \theta_k$.

The goal is to select the population, i.e. coordinate, which is associated with $\theta_{[k]} = \max\{\theta_1, \ldots, \theta_k\}$, and to simultaneously estimate the $\theta$-value of the selected population. Since Bayes rules are the main topic of this paper, only nonrandomized decision rules need to be considered; they are represented as follows.

(1) $d(x) = \big(s(x), e_{s(x)}(x)\big), \quad x \in \mathbb{R}^k,$

where $s(x) \in \{1, 2, \ldots, k\}$ is the selection rule, and $e_i(x) \in \Omega$, $i = 1, \ldots, k$, is a collection of $k$ estimates for $\theta_i$, $i = 1, \ldots, k$, respectively, available at selection.

The loss function is assumed to be additive,

(2) $L(\theta, d) = A(\theta, s) + B(\theta_s, e_s),$

where $A$ represents the loss of selecting population $\pi_s$ at $\theta$, and $B$ the loss of estimating $\theta_s$ by $e_s$. The following examples will be considered.

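(3) $A_0(\theta, s) = c\,\mathbf{1}\{\theta_s < \theta_{[k]}\};$

$A_1(\theta, s) = c(\theta_{[k]} - \theta_s), \qquad B_1(\theta_s, e_s) = |\theta_s - e_s|;$

$A_2(\theta, s) = c(\theta_{[k]} - \theta_s)^2, \qquad B_2(\theta_s, e_s) = (\theta_s - e_s)^2.$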

All combinations of $A$'s and $B$'s are reasonable in one way or another, and $c > 0$ gives relative weights to the two types of losses. However, it seems that the most appealing and realistic combinations are


(4) $C_1(\theta, d) = c(\theta_{[k]} - \theta_s) + |\theta_s - e_s|$, and

$C_2(\theta, d) = c(\theta_{[k]} - \theta_s)^2 + (\theta_s - e_s)^2.$

In the Bayes approach, let the vector of the $k$ unknown parameters be random and denoted by $\Theta$. Under a prior distribution of $\Theta$, the posterior risk at $X = x$ can be represented as follows.

(5) $r(d(x) \mid x) = r_A(s(x) \mid x) + r_B\big(s(x), e_{s(x)}(x) \mid x\big),$

where $r_A(s(x) \mid x) = E\{A(\Theta, s(x)) \mid X = x\}$ and $r_B(s(x), e_{s(x)}(x) \mid x) = E\{B(\Theta_{s(x)}, e_{s(x)}(x)) \mid X = x\}$.

As mentioned in the Introduction, the decision theoretic treatment of the combined selection-estimation problem leads to "selecting after estimation". This can now be seen from the following fact, which is a straightforward extension of the main result in Cohen and Sackrowitz (1988).

Lemma 1. Let $e_i^*(x)$ minimize $r_B(i, e_i(x) \mid x)$, $i = 1, \ldots, k$. Furthermore, let $s^*(x)$ minimize $r_A(s(x) \mid x) + r_B\big(s(x), e^*_{s(x)}(x) \mid x\big)$. Then the Bayes decision rule, at $X = x$, is $d^*(x) = \big(s^*(x), e^*_{s^*(x)}(x)\big)$.
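Lemma 1 translates directly into a two-step computation: estimate first, then select. The Python sketch below approximates it from posterior draws under the loss $C_1$ of (4), using the fact that a posterior median minimizes expected absolute error. The helper names, the draw-based input format, and the independent normal posteriors in the usage lines are illustrative assumptions, not part of the paper.

```python
import random
import statistics


def select_after_estimate(draws, c):
    """Approximate the Bayes rule of Lemma 1 under loss C_1 in (4).

    draws[m][i] is the m-th posterior draw of Theta_i (assumed input format).
    Returns the selected index s* and its Bayes estimate e*_{s*}.
    """
    k = len(draws[0])
    # Step 1 (estimation): under absolute error loss B_1 the posterior median
    # minimizes r_B(i, e_i | x), separately for every i.
    est = [statistics.median(d[i] for d in draws) for i in range(k)]
    # Step 2 (selection): total posterior risk r_A + r_B of selecting each i.
    risk = []
    for i in range(k):
        r_a = statistics.fmean(c * (max(d) - d[i]) for d in draws)
        r_b = statistics.fmean(abs(d[i] - est[i]) for d in draws)
        risk.append(r_a + r_b)
    s_star = min(range(k), key=risk.__getitem__)
    return s_star, est[s_star]


def normal_draws(means, sds, size, rng):
    """Draws from independent normal posteriors (an illustrative assumption)."""
    return [[rng.gauss(m, s) for m, s in zip(means, sds)] for _ in range(size)]


rng = random.Random(0)
# Equal posterior spreads: the largest posterior mean is selected.
s_equal, _ = select_after_estimate(normal_draws([1.0, 0.5, 0.0], [1.0] * 3, 20000, rng), c=1.0)
# Very unequal spreads and a small weight c: the precisely estimated,
# slightly smaller mean wins because its estimation risk is much lower.
s_unequal, _ = select_after_estimate(normal_draws([1.0, 0.9], [2.0, 0.1], 20000, rng), c=0.2)
print(s_equal, s_unequal)  # prints: 0 1
```

The second call shows the coupling that Lemma 1 captures: the estimation component of the risk can override the ordering of the posterior means.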

It should be pointed out that $e_i^*(x)$ is the usual Bayes estimate of $\theta_i$, $i = 1, \ldots, k$, if estimation alone is of concern. No bias reduction is involved, which has been the main concern in the papers on estimation after selection mentioned in the Introduction.
Under certain circumstances, the problems of selection and estimation can be completely separated. More precisely, the following holds.

Corollary 1. Whenever, at some $X = x$, $r_B(i, e_i^*(x) \mid x)$ does not depend on $i \in \{1, 2, \ldots, k\}$, $s^*(x)$ minimizes $r_A(s(x) \mid x)$.

Let us briefly consider the selection problem on its own, i.e. assume that loss function $B$ in (2) is zero. Then the natural selection rule $s^N(x)$, which selects in terms of the largest $x_i$, is known to have strong optimality properties. If the density of $X$ is of the form $\prod_{i=1}^{k} f(x_i \mid \theta_i)$, where $f$ has monotone likelihood ratio, and if loss function $A$ in (2) is permutation invariant and favors selection of larger $\theta$-values, then $s^N$ is the best permutation invariant selection rule, uniformly in $\theta$, i.e. it is a Bayes selection rule for every permutation symmetric prior. This and further results can be found in Gupta and Miescke (1984).

In combination with estimation of the parameter of the selected population, however, it can occur in the above situation, where $s^N$ is the uniformly best invariant selection rule, that $s^N$ is not part of the Bayes rule $d^* = (s^*, e^*_{s^*})$, i.e. that $s^*$ is different from $s^N$. More precisely, by Corollary 1 this can happen only when the assumption of Corollary 1 is not met. The following example illustrates this fact.

Example 1. Let $X_i \sim N(\theta_i, 1)$, $i = 1, \ldots, k$, be independent. Assume that a priori, $\Theta_1, \ldots, \Theta_k$ are a random sample from an exponential distribution with density $\exp(-\theta)$, $\theta > 0$. Finally, let $L(\theta, d) = A(\theta, s) + (\theta_s - e_s)^2$, where $A$ is permutation invariant and favors selection of larger $\theta$-values.

A posteriori, at $X = x$, $\Theta_1, \ldots, \Theta_k$ are independent, and the posterior density of $\Theta_i$ is $\varphi(\theta_i - y_i)/\Phi(y_i)$, $\theta_i > 0$, where $y_i = x_i - 1$, $i = 1, \ldots, k$, and where $\varphi$ and $\Phi$ denote the density and c.d.f. of $N(0, 1)$.

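Standard calculations for this truncated normal posterior lead to the following results for $i = 1, \ldots, k$.

(6) $e_i^*(x) = E\{\Theta_i \mid X = x\} = y_i + \varphi(y_i)/\Phi(y_i)$, and

$r_B(i, e_i^*(x) \mid x) = \mathrm{Var}\{\Theta_i \mid X = x\} = 1 - y_i\varphi(y_i)/\Phi(y_i) - [\varphi(y_i)/\Phi(y_i)]^2.$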

Thus, although $s^N(x)$ minimizes $r_A(s(x) \mid x)$, $s^N$ is not equal to $s^*$, since $r_B(i, e_i^*(x) \mid x)$ depends on $i \in \{1, \ldots, k\}$, except on a Lebesgue null set of $x$-values.
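The closed forms in (6) are easy to confirm numerically, and a two-population illustration makes the conclusion concrete. In the Python sketch below, the choice $A = A_1$ with $c = 0.1$ and the observations $x = (1.2, 1.0)$ are hypothetical. With $A = A_1$, the posterior risk of selecting $i$ is $cE\{\Theta_{[k]} \mid X = x\} - c\,e_i^*(x) + \mathrm{Var}\{\Theta_i \mid X = x\}$, so the first term, common to all $i$, can be dropped.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1): pdf = phi, cdf = Phi


def posterior_mean_var(x):
    """Closed forms (6): X_i ~ N(theta_i, 1), prior density exp(-theta), theta > 0."""
    y = x - 1.0
    lam = STD.pdf(y) / STD.cdf(y)  # phi(y) / Phi(y)
    return y + lam, 1.0 - y * lam - lam ** 2


def numeric_mean_var(x, n=20000, upper=12.0):
    """Cross-check: midpoint integration of the unnormalized posterior phi(theta - y) on (0, upper)."""
    y = x - 1.0
    h = upper / n
    m0 = m1 = m2 = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        w = STD.pdf(t - y)
        m0 += w
        m1 += w * t
        m2 += w * t * t
    mean = m1 / m0
    return mean, m2 / m0 - mean ** 2


c = 0.1          # hypothetical weight; A = A_1 is an illustrative choice
x = [1.2, 1.0]   # hypothetical observations
# Posterior risk of selecting i, up to the common term c * E{Theta_[k] | X = x}.
crit = [-c * m + v for m, v in map(posterior_mean_var, x)]
s_star = min(range(len(x)), key=crit.__getitem__)
s_nat = max(range(len(x)), key=x.__getitem__)
print(s_nat, s_star)  # prints: 0 1
```

Here the larger observation also has the larger posterior variance, so the Bayes rule selects the second population while $s^N$ selects the first.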

At the end of this section, let us briefly justify the choice of a Bayes approach to the given problem by pointing out that the classical (frequentist) approach does not offer a direct analytical solution. The risk function of a decision rule $d$ at parameter point $\theta \in \mathbb{R}^k$ is given by

(7) $R(\theta, d) = E_\theta\{A(\theta, s(X))\} + E_\theta\{B(\theta_{s(X)}, e_{s(X)}(X))\}.$

For one fixed selection rule $s$, the second term can be optimized, at least approximately, in many circumstances. This has been done in the previous papers dealing with estimation of the parameter of the selected population, where $s = s^N$ has been assumed. However, to optimize $R(\theta, d)$, one has to consider at least some class of possible selection rules for $s$, which does not appear to be feasible. Bayes rules, on the other hand, can be found in a constructive way, as shown in Lemma 1.

3. Independent Normal Populations With Equal Sample Sizes

Let $X_{i1}, \ldots, X_{in}$ be a sample from $N(\theta_i, \sigma^2)$, $i = 1, \ldots, k$, where $\sigma^2 > 0$ is known, and let all samples be independent. Let $X_i = n^{-1}\sum_{j=1}^{n} X_{ij}$, $i = 1, \ldots, k$, be the sample means, which are sufficient for $\theta$.

Assume that a priori, $\Theta_1, \ldots, \Theta_k$ is a random sample from $N(\mu, q)$, where $\mu \in \mathbb{R}$ and $q > 0$ are known. This conjugate prior will prove to be useful in several respects, as it has done previously in Sackrowitz and Samuel-Cahn (1984), Cohen and Sackrowitz (1988), and Gupta and Miescke (1988). A posteriori, given $X = x$, the $\Theta_i \sim N\big((qx_i + p\mu)/(p + q),\ pq/(p + q)\big)$, $i = 1, \ldots, k$, are independent, where $p = \sigma^2/n$. And marginally, $X_1, \ldots, X_k$ is a sample from $N(\mu, p + q)$.
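For the conjugate prior just introduced, the posterior of $\Theta_i$ is available in closed form. As a sanity check, the Python sketch below compares it with brute-force numerical integration of prior density times likelihood; the values $\mu = 0$, $q = 2$, $p = 0.5$ are arbitrary illustrations. It also prints $v = pq/(p + q)$ and $(2v/\pi)^{1/2}$, the posterior expected squared and absolute errors of the posterior mean, which are the same for every $i$ and every $x$.

```python
import math


def conjugate_posterior(x, mu, q, p):
    """Posterior mean and variance of Theta_i given X_i = x,
    for prior N(mu, q) and X_i | theta_i ~ N(theta_i, p), p = sigma^2 / n."""
    return (q * x + p * mu) / (p + q), p * q / (p + q)


def numeric_posterior(x, mu, q, p, n=40000, half_width=12.0):
    """Brute-force check: midpoint integration of prior density times likelihood."""
    mean0, _ = conjugate_posterior(x, mu, q, p)
    sd = math.sqrt(min(p, q))  # at least the posterior standard deviation
    lo, h = mean0 - half_width * sd, 2 * half_width * sd / n
    m0 = m1 = m2 = 0.0
    for j in range(n):
        t = lo + (j + 0.5) * h
        w = math.exp(-(t - mu) ** 2 / (2 * q) - (x - t) ** 2 / (2 * p))
        m0, m1, m2 = m0 + w, m1 + w * t, m2 + w * t * t
    mean = m1 / m0
    return mean, m2 / m0 - mean ** 2


mu, q, p = 0.0, 2.0, 0.5  # illustrative values
for x in (-1.0, 0.3, 2.5):
    m, v = conjugate_posterior(x, mu, q, p)
    # Posterior mean, posterior variance (risk under B_2), and risk under B_1.
    print(x, round(m, 4), round(v, 4), round(math.sqrt(2 * v / math.pi), 4))
```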

For a slightly more general class of priors, the following result can be shown to hold.

Theorem 1. For the loss function $L = A + B$ in (2), assume that $A$ is permutation symmetric and favors selection of larger $\theta$-values, and that $B$ is either $B_1$ or $B_2$ in (3). Then for every exchangeable normal prior, the Bayes rule $d^* = (s^*, e^*_{s^*})$ satisfies $s^* = s^N$ and $e_i^*(x) = E\{\Theta_i \mid X = x\}$, $i = 1, \ldots, k$.

Proof: A priori, let $\Theta \sim N(\mu\mathbf{1}, aI + b\mathbf{1}\mathbf{1}^T)$, where $a > 0$, $a + kb > 0$, $\mathbf{1}^T = (1, 1, \ldots, 1)$, and $I$ is the identity matrix. Then a posteriori, given $X = x$, $\Theta \sim N(e^*(x), \alpha I + \beta\mathbf{1}\mathbf{1}^T)$, where $\alpha = ap/(a + p)$, $\beta = bp^2/[(p + a + kb)(p + a)]$, and

(8) $e^*(x) = E\{\Theta \mid X = x\} = [\alpha I + \beta\mathbf{1}\mathbf{1}^T][(a + kb)^{-1}\mu\mathbf{1} + p^{-1}x].$

Its components $e_i^*(x) = E\{\Theta_i \mid X = x\}$ are the Bayes estimates, since the posterior mean, which for a normal posterior is also the posterior median, minimizes $r_B(i, e_i(x) \mid x)$ under both $B = B_1$ and $B = B_2$. The minimum values are, respectively,

(9) $r_{B_1}(i, e_i^*(x) \mid x) = [2(\alpha + \beta)/\pi]^{1/2}$, and $r_{B_2}(i, e_i^*(x) \mid x) = \alpha + \beta,$

which depend neither on $i \in \{1, 2, \ldots, k\}$ nor on $x$. The latter fact will be utilized later in this section.

Thus, the assumption of Corollary 1 is fulfilled at every $x \in \mathbb{R}^k$, and from the discussion following Corollary 1 it follows that $s^N(x)$ minimizes $r_A(s(x) \mid x)$ at every $x \in \mathbb{R}^k$, i.e. $s^* = s^N$. This completes the proof of the theorem.

In the remainder of this section, let us consider the natural decision procedure $d^N = (s^N, e^N_{s^N})$, which employs the estimates $e_i^N(x) = x_i$, $i = 1, \ldots, k$. Although, from the frequentist point of view, it has the undesirable feature of overestimating the largest mean, and thus even more so the selected mean, $d^N$ is the generalized Bayes rule for the noninformative prior, i.e. the Lebesgue measure on $\mathbb{R}^k$. The i.i.d. normal prior considered at the beginning of this section can be used for further examination of $d^N$, since for $q$ tending to infinity, the posterior distributions tend to the formal posterior distribution associated with the noninformative prior.

As mentioned in the Introduction, typical features and difficulties in selection problems do not appear before $k$ is at least equal to three. The following result may be considered as an illustration of this fact.

Theorem 2. For the loss function $L = A + B$ in (2), assume that $A = A_0$, and that $B$ is either $B_1$ or $B_2$ in (3). Then the following holds: the rule $d^N = (s^N, e^N_{s^N})$ is minimax if and only if $k = 2$.

Proof: Obviously, $e^N_{s^N(x)}(x) = x_{[k]}$ for every $x \in \mathbb{R}^k$. Cohen and Sackrowitz (1982) have shown that for every loss function $L$ with $L(0) = 0$, $L(z) = L(-z)$, and $L(|z|)$ increasing in $|z|$, $E_\theta\{L(X_{[k]} - \theta_{s^N(X)})\}$ is maximized at $\theta = 0$. Thus, the maximum of $E_\theta\{|X_{[k]} - \theta_{s^N(X)}|^m\}$ is found to be equal to $p(a_k^2 + b_k^2)$ for $m = 2$, and equal to $p^{1/2}c_k$ for $m = 1$, where $a_k^2 = \mathrm{Var}(N_{[k]})$, $b_k = E(N_{[k]})$, $c_k = E(|N_{[k]}|)$, and $N_{[k]}$ is the maximum of a sample of size $k$ from a standard normal distribution. The first fact has been shown in Sackrowitz and Samuel-Cahn (1986), and the second follows in a similar way.

The maximum of $E_\theta\{A_0(\theta, s^N(X))\}$ also occurs at $\theta = 0$, and it is equal to $c(1 - 1/k)$, which has been shown in Gupta and Miescke (1988). Thus,

(10) $$\sup_{\theta} R(\theta, d^N) = \begin{cases} c(1 - 1/k) + p^{1/2}c_k, & \text{if } B = B_1,\\ c(1 - 1/k) + p(a_k^2 + b_k^2), & \text{if } B = B_2. \end{cases}$$


At this point it is convenient to consider the following randomized rule $d^0$, say, which uses $e_i^N(x) = x_i$, $i = 1, \ldots, k$, and selects, without any consideration of the data, each population with the same probability $1/k$. Obviously, for all $\theta$,

(11) $$R(\theta, d^0) = \begin{cases} c(1 - 1/k) + (2p/\pi)^{1/2}, & \text{if } B = B_1,\\ c(1 - 1/k) + p, & \text{if } B = B_2, \end{cases}$$

which provide upper bounds for the respective minimax values. And since $a_k^2 + b_k^2 > 1$ as well as $c_k > (2/\pi)^{1/2}$ for $k \ge 3$, $d^N$ cannot be a minimax rule for $k \ge 3$.

Finally, to see that $d^N$ is minimax for $k = 2$, one realizes that (10) and (11) are identical in this case, since $c_2 = (2/\pi)^{1/2}$ and $a_2^2 + b_2^2 = 1$. It suffices to find a sequence of priors whose Bayes risks tend to (11), because $d^0$ is an equalizer rule. This sequence is provided by the conjugate prior considered at the beginning of this section, by letting $q$ vary, $q = 1, 2, \ldots$. From (9), with $a = q$ and $b = 0$, so that $\alpha = qp/(q + p)$ and $\beta = 0$, it follows that the part of the Bayes risk due to estimation tends, for large $q$-values, to $(2p/\pi)^{1/2}$ and $p$ under $B = B_1$ and $B = B_2$, respectively. The other part of the Bayes risk, due to selection, tends to $c(1 - 1/k)$, as has been shown by Gupta and Miescke (1988). This completes the proof of the theorem.
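The constants that decide minimaxity in the proof of Theorem 2, $a_k^2 + b_k^2 = E(N_{[k]}^2)$ and $c_k = E|N_{[k]}|$, can be checked numerically from the density $k\varphi(z)\Phi(z)^{k-1}$ of $N_{[k]}$. The Python sketch below uses plain midpoint integration (an arbitrary choice of quadrature); it reproduces $a_2^2 + b_2^2 = 1$ and $c_2 = (2/\pi)^{1/2}$, and shows both comparisons becoming strict for $k \ge 3$.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


def max_normal_constants(k, n=20000, lim=10.0):
    """Return (a_k^2, b_k, c_k) for N_[k], the maximum of k i.i.d. N(0, 1),
    by midpoint integration against its density k * phi(z) * Phi(z)**(k - 1)."""
    h = 2 * lim / n
    e1 = e2 = eabs = 0.0
    for j in range(n):
        z = -lim + (j + 0.5) * h
        w = k * phi(z) * Phi(z) ** (k - 1) * h
        e1 += w * z
        e2 += w * z * z
        eabs += w * abs(z)
    return e2 - e1 ** 2, e1, eabs


bound_c = math.sqrt(2 / math.pi)  # c_2, and also E|N| for a single N(0, 1)
for k in range(2, 6):
    a2, b, ck = max_normal_constants(k)
    # k = 2: both comparisons are equalities; k >= 3: both are strict.
    print(k, round(a2 + b * b, 4), round(ck, 4), round(bound_c, 4))
```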

Remark: The "if" part of Theorem 2 is also valid under the assumption $A = A_1$ or $A = A_2$. Although the parameter configuration $\theta_1 = \theta_2 = \cdots = \theta_k$ is not least favorable for $E_\theta[A_j(\theta, s^N(X))]$, $j = 1, 2$, minimaxity holds because the risk due to estimation of $d^N$ is constant in $\theta$ for $k = 2$. For $k \ge 3$, however, the question of minimaxity remains open, since the two risk parts, due to selection and due to estimation, do not attain their maxima at a common parameter configuration.

We conclude this section with the following.

Theorem 3. For every loss function (2) with components $A$ and $B$ taken from (3), the rule $d^N = (s^N, e^N_{s^N})$ is an extended Bayes rule.

Proof: Consider the same sequence of priors that was used at the end of the proof of Theorem 2. Under $B = B_2$, the posterior risk due to estimation of $d^N$ is given by

(12) $$\begin{aligned} E\{(\Theta_{s^N(x)} - x_{[k]})^2 \mid X = x\} &= E\{(\Theta_{s^N(x)} - e^*_{s^N(x)}(x))^2 \mid X = x\} + (e^*_{s^N(x)}(x) - x_{[k]})^2\\ &= pq(p + q)^{-1} + [p(p + q)^{-1}]^2(x_{[k]} - \mu)^2. \end{aligned}$$

Since, marginally, $X_1, \ldots, X_k$ is a sample from $N(\mu, p + q)$, the Bayes risk due to estimation of $d^N$ turns out to be

(13) $pq(p + q)^{-1} + p^2(p + q)^{-1}(a_k^2 + b_k^2),$

which tends to $p$ as $q$ tends to infinity, where $p$ is also the limit of the Bayes risk due to estimation of the Bayes rule, which by (9) equals $pq(p + q)^{-1}$.
Under $B = B_1$, it now follows immediately that the corresponding Bayes risk due to estimation of $d^N$ satisfies

(14) $E\{|\Theta_{s^N(X)} - X_{[k]}|\} \to (2p/\pi)^{1/2}$ as $q \to \infty$,

since, by the facts stated above,

(15) $E\{(e^*_{s^N(X)}(X) - X_{[k]})^2\} \to 0$ as $q \to \infty$.

The second part of the proof deals with the other part of the Bayes risk, due to selection. Since under each $A = A_j$, $j = 0, 1, 2$, $s^N$ is employed by the Bayes rule, i.e. $s^N = s^*$, it remains to be shown that in all three cases the limits of the Bayes risks are finite. The case $A = A_0$ has been treated in Gupta and Miescke (1988), where it is shown that the Bayes risk due to selection of $d^N$ tends to $c(1 - 1/k)$ as $q$ tends to infinity.

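For $A = A_2$, the risk due to selection of $d^N$ satisfies, at every fixed $\theta$ with, say, $\theta_k = \theta_{[k]}$,

(16) $$\begin{aligned} E_\theta[A_2(\theta, s^N(X))] &= c\sum_{i=1}^{k-1}(\theta_k - \theta_i)^2\,P_\theta\{X_i = X_{[k]}\}\\ &\le c\sum_{i=1}^{k-1}(\theta_k - \theta_i)^2\,P_\theta\{X_i \ge X_k\}\\ &= 2pc\sum_{i=1}^{k-1}\lambda_i^2\,\Phi(-\lambda_i)\\ &\le 2pc\sum_{i=1}^{k-1}\lambda_i\,\varphi(\lambda_i) \le 2pc(k - 1)w, \end{aligned}$$

where $\lambda_i = (\theta_k - \theta_i)/(2p)^{1/2}$, $i = 1, \ldots, k - 1$, and $w = \varphi(1)$. The first inequality is obvious, the second follows from the fact that $\lambda\Phi(-\lambda) \le \varphi(\lambda)$ for $\lambda > 0$, and the third holds since the maximum of $\lambda\varphi(\lambda)$ on the positive real line occurs at $\lambda = 1$.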
Since the bound in (16) does not depend on $\theta$, it follows that the Bayes risk due to selection of $d^N$ tends to a finite limit as $q$ tends to infinity. Finally, applying the Schwarz inequality to (16), the same is seen to hold under $A = A_1$. Actually, in this case one can verify that the limit is zero. This completes the proof of the theorem.
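Formula (13) can be sanity-checked by simulating the Bayes experiment directly. The Python sketch below draws $\Theta_i \sim N(\mu, q)$ and $X_i \sim N(\Theta_i, p)$, applies $d^N$, and averages the squared estimation error; the values $p = 1$, $q = 2$, the seed, and the replication count are arbitrary choices. For $k = 2$ the value of (13) is exactly $p$ for every $q$, since $a_2^2 + b_2^2 = 1$.

```python
import math
import random


def max_normal_second_moment(k, n=20000, lim=10.0):
    """a_k^2 + b_k^2 = E(N_[k]^2), by midpoint integration against k*phi*Phi^(k-1)."""
    h = 2 * lim / n
    total = 0.0
    for j in range(n):
        z = -lim + (j + 0.5) * h
        phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
        total += z * z * k * phi * Phi ** (k - 1) * h
    return total


def formula_13(k, p, q):
    """Bayes risk due to estimation of d^N under B_2, as given in (13)."""
    return p * q / (p + q) + p ** 2 / (p + q) * max_normal_second_moment(k)


def simulate_13(k, p, q, mu=0.0, reps=100000, seed=1):
    """Monte Carlo version: Theta_i ~ N(mu, q), X_i ~ N(Theta_i, p), then apply d^N."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        theta = [rng.gauss(mu, math.sqrt(q)) for _ in range(k)]
        x = [rng.gauss(t, math.sqrt(p)) for t in theta]
        s = max(range(k), key=x.__getitem__)  # natural selection rule s^N
        total += (theta[s] - x[s]) ** 2       # squared error of the estimate x_[k]
    return total / reps


for k in (2, 3):
    print(k, round(formula_13(k, 1.0, 2.0), 4), round(simulate_13(k, 1.0, 2.0), 4))
```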


4. Independent Normal Populations With Unequal Sample Sizes

Let $X_{i1}, \ldots, X_{in_i}$ be a sample from $N(\theta_i, \sigma^2)$, $i = 1, \ldots, k$, where $\sigma^2 > 0$ is known, and where not all of the sample sizes $n_1, \ldots, n_k$ are equal. The $k$ samples are assumed to be independent. As before, sufficiency leads to considering the sample means $X_i = n_i^{-1}\sum_{j=1}^{n_i} X_{ij}$, $i = 1, \ldots, k$, which have variances $p_i = n_i^{-1}\sigma^2$, $i = 1, \ldots, k$, respectively.

For various reasons mentioned before, as well as those discussed in Gupta and Miescke (1988), the 0-1 loss for selection, i.e. $A_0$, will not be considered any further. Moreover, to keep the analysis to a reasonable size, we restrict ourselves in the sequel to the two most appealing and realistic loss combinations $C_1$ and $C_2$ given by (4). It should be pointed out that the risk function of every decision rule $d = (s, e_s)$ is continuous in $\theta$ under both $C_1$ and $C_2$. Continuity of the risk due to selection under loss $A_1$ has been justified in Gupta and Miescke (1988), and the same arguments apply to $A_2$. Continuity of the risk due to estimation under losses $B_1$ and $B_2$ is well known. Thus, all proper Bayes rules derived in the sequel, as well as those considered before with any loss combination from (2) and (3), are admissible.

Since the analysis of Bayes rules $d^* = (s^*, e^*_{s^*})$ under loss function $C_1$ is easier to manage, let us deal with it first. The risk function of a decision rule $d = (s, e_s)$ at parameter point $\theta$ is given by

(17) $R(\theta, d) = cE_\theta(\theta_{[k]} - \theta_{s(X)}) + E_\theta(|\theta_{s(X)} - e_{s(X)}(X)|).$


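In the present situation of unequal sample sizes $n_1, \ldots, n_k$, it is appropriate to consider, more generally, also non-exchangeable normal priors, $\Theta_i \sim N(\mu_i, q_i)$, $i = 1, \ldots, k$. Independence of $\Theta_1, \ldots, \Theta_k$, however, will be kept as before. Thus, a posteriori, given $X = x$, $\Theta_1, \ldots, \Theta_k$ are independent, with $\Theta_i \sim N\big((q_ix_i + p_i\mu_i)/(q_i + p_i),\ q_ip_i/(q_i + p_i)\big)$, $i = 1, \ldots, k$, and marginally, $X_1, \ldots, X_k$ are independent with $X_i \sim N(\mu_i, p_i + q_i)$, $i = 1, \ldots, k$.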

By Lemma 1, the Bayes rule employs the estimator $e_i^*(x) = (q_ix_i + p_i\mu_i)/(q_i + p_i)$ for $\theta_i$, $i = 1, \ldots, k$, and it remains to find $s^*(x)$. For any decision rule $d = (s, e^*_s)$, the posterior risk at $X = x$ for selection $s(x) = i \in \{1, \ldots, k\}$ turns out to be

(18) $$cE\{\Theta_{[k]} \mid X = x\} - c\,\frac{q_ix_i + p_i\mu_i}{q_i + p_i} + \left[\frac{2}{\pi}\,\frac{q_ip_i}{q_i + p_i}\right]^{1/2}.$$

Thus, the following is seen to hold.


Theorem 4. Under loss function $C_1$ and the normal prior considered above, the Bayes rule $d^* = (s^*, e^*_{s^*})$ employs $e_i^*(x) = (q_ix_i + p_i\mu_i)/(q_i + p_i)$, $i = 1, \ldots, k$, and $s^*(x)$ maximizes $c\,e_i^*(x) - [2q_ip_i/\pi(q_i + p_i)]^{1/2}$, $i = 1, \ldots, k$.

There are three special cases which deserve to be studied in more detail. They are as follows.

Case 1: Noninformative prior; or $q_i \to \infty$, $i = 1, \ldots, k$. In this case, $e_i^*(x) = x_i$, $i = 1, \ldots, k$, and $s^*(x)$ maximizes $x_i - c^{-1}(2p_i/\pi)^{1/2}$, $i = 1, \ldots, k$.

Case 2: Prior variances proportional to sample variances; i.e. $q_i = \gamma p_i$, $i = 1, \ldots, k$, for some fixed $\gamma > 0$. In this case, $e_i^*(x) = (\gamma x_i + \mu_i)/(\gamma + 1)$, $i = 1, \ldots, k$, and $s^*(x)$ maximizes $e_i^*(x) - c^{-1}(2\gamma p_i/(\gamma + 1)\pi)^{1/2}$, $i = 1, \ldots, k$. Especially, for $\mu_1 = \cdots = \mu_k = \mu$, say, $e_i^*(x) = (\gamma x_i + \mu)/(\gamma + 1)$, $i = 1, \ldots, k$, and $s^*(x)$ maximizes $x_i - c^{-1}(2(\gamma + 1)p_i/\gamma\pi)^{1/2}$, $i = 1, \ldots, k$.

Case 3: Posterior is decreasing in transposition (DT); i.e. $q_i^{-1} + p_i^{-1} = r^{-1}$, $i = 1, \ldots, k$, for some fixed $r > 0$. Here, the sum of prior precision and sampling precision is constant across the $k$ populations. Such priors have been considered and justified in Gupta and Miescke (1988). The idea for applications is the following. If the normal priors are not exchangeable, a proper choice of sample sizes $n_1, \ldots, n_k$ in the planning of the experiment can lead, at least approximately, to a posterior which is (DT). This is highly desirable, since in that situation quite simple Bayes rules are usually found. In the present case, $e_i^*(x) = r(p_i^{-1}x_i + q_i^{-1}\mu_i)$, $i = 1, \ldots, k$, and $s^*(x)$ maximizes $e_i^*(x)$, $i = 1, \ldots, k$. Especially, for $\mu_1 = \cdots = \mu_k = \mu$, say, $e_i^*(x) = p_i^{-1}r(x_i - \mu) + \mu$, $i = 1, \ldots, k$, and $s^*(x)$ maximizes $p_i^{-1}(x_i - \mu)$, $i = 1, \ldots, k$.
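Theorem 4 and its three cases reduce to a one-line score per population. The Python sketch below implements that score; the helper name `bayes_rule_c1`, the use of `math.inf` for the noninformative limit of Case 1, and all numerical inputs are illustrative assumptions. The Case 3 part constructs $q_i$ from $q_i^{-1} + p_i^{-1} = r^{-1}$ and checks that the selected index maximizes $e_i^*(x) = p_i^{-1}r(x_i - \mu_i) + \mu_i$, whatever the weight $c$.

```python
import math


def bayes_rule_c1(x, p, mu, q, c):
    """Bayes rule of Theorem 4 under loss C_1, independent priors N(mu_i, q_i).

    q_i = math.inf encodes the noninformative limit of Case 1 (e_i* = x_i).
    Returns the selected index s* and the estimate e*_{s*}(x).
    """
    est, score = [], []
    for xi, pv, m, qv in zip(x, p, mu, q):
        if math.isinf(qv):
            e, v = xi, pv                        # Case 1 limit
        else:
            e = (qv * xi + pv * m) / (qv + pv)   # posterior mean e_i*(x)
            v = qv * pv / (qv + pv)              # posterior variance
        est.append(e)
        score.append(c * e - math.sqrt(2 * v / math.pi))
    s = max(range(len(x)), key=score.__getitem__)
    return s, est[s]


# Case 1: a slightly larger but very imprecise x_1 loses to a precise x_2.
print(bayes_rule_c1([1.0, 0.9], [4.0, 0.01], [0.0, 0.0], [math.inf] * 2, c=1.0))

# Case 3 (DT): choose q_i with 1/q_i + 1/p_i = 1/r (requires r < p_i);
# then all posterior variances equal r and s* maximizes e_i*(x).
r, p, mu, x = 0.5, [1.0, 0.6, 2.0], [0.0, 0.5, -0.2], [0.8, 0.3, 1.5]
q = [1.0 / (1.0 / r - 1.0 / pv) for pv in p]
dt_est = [r * (xi - m) / pv + m for xi, m, pv in zip(x, mu, p)]
print(bayes_rule_c1(x, p, mu, q, c=0.3), dt_est)
```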

The decision rule considered last in Case 3 is of a very simple and appealing form. Due to the (DT) property of the posterior, it can be seen to be the Bayes rule under the large class of loss functions assumed in Theorem 1. Without further proof, the following can be stated.

Corollary 2. For the loss function $L = A + B$ in (2), assume that $A$ is permutation symmetric and favors selection of larger $\theta$-values, and that $B$ is either $B_1$ or $B_2$ in (3). If the normal prior satisfies $q_i^{-1} + p_i^{-1} = r^{-1}$, $i = 1, \ldots, k$, for some $r > 0$, then the Bayes rule $d^* = (s^*, e^*_{s^*})$ is of the following form: $e_i^*(x) = p_i^{-1}r(x_i - \mu_i) + \mu_i$, $i = 1, \ldots, k$, and $s^*(x)$ maximizes $e_i^*(x)$.

There is one interesting feature of the Bayes rule given by Theorem 4 which is worth pointing out explicitly. Whenever, for some $i \in \{1, \ldots, k\}$, $x_i$ turns out to be larger than $\mu_i$, a smaller (larger) $p_i$, i.e. a larger (smaller) $n_i$, works in favor of (against) population $\pi_i$ being selected. And for $x_i < \mu_i$, the reverse is seen to hold true for the rule derived in Case 3. On the other hand, the rule of Case 1, as well as that of Case 2 for $\mu_1 = \cdots = \mu_k$, have the property that at any $x_i$, a smaller (larger) $p_i$ works in favor of (against) $\pi_i$ being selected.

It is also interesting to note that, from a frequentist point of view, the decision rule in Case 1, which may be considered as a "natural rule" under unequal sample sizes, selects in terms of lower confidence bounds for $\theta_1, \ldots, \theta_k$ at a common fixed confidence level. Especially, for the values $c = 0.485$ and $c = 0.343$, $x_i - c^{-1}(2p_i/\pi)^{1/2}$ is a lower confidence bound for $\theta_i$ with confidence level 95% and 99%, respectively. The same can be said about the rule in Case 2 for $\mu_1 = \cdots = \mu_k$. Finally, the following can be shown.
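The two constants quoted above follow from standard normal quantiles: since $X_i \sim N(\theta_i, p_i)$, the one-sided lower bound $x_i - z\,p_i^{1/2}$ has confidence level $\Phi(z)$, and it coincides with the Case 1 criterion exactly when $c = (2/\pi)^{1/2}/z$. The Python sketch below computes this $c$ with `statistics.NormalDist`; the helper name `c_for_level` is hypothetical.

```python
import math
from statistics import NormalDist


def c_for_level(level):
    """Weight c for which x_i - c^{-1} (2 p_i / pi)^{1/2} equals the one-sided
    lower confidence bound x_i - z * p_i^{1/2} with Phi(z) = level."""
    z = NormalDist().inv_cdf(level)
    return math.sqrt(2 / math.pi) / z


for level in (0.95, 0.99):
    print(level, round(c_for_level(level), 3))
# prints:
# 0.95 0.485
# 0.99 0.343
```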

Theorem 5. Under the loss function $C_1$, the decision rule of Case 1 is an extended Bayes rule.

Proof: This can be shown under Case 2 with $\mu_i = 0$, $i = 1, \ldots, k$, by letting $\gamma$ tend to infinity. The posterior risk of the Bayes rule at $X = x$ is, in view of (18),

(19) $$cE\{\Theta_{[k]} \mid X = x\} - \max_{i=1,\ldots,k}\{c\gamma(\gamma + 1)^{-1}x_i - [2\gamma p_i/\pi(\gamma + 1)]^{1/2}\}.$$

On the other hand, the posterior risk of the rule of Case 1 is, under Case 2,

(20) $$\begin{aligned} &cE\{\Theta_{[k]} \mid X = x\} - \max_{i=1,\ldots,k}\{c\gamma(\gamma + 1)^{-1}x_i - (2p_i/\pi)^{1/2}\}\\ &\quad + E\{|\Theta_s - x_s| \mid X = x\} - (2p_s/\pi)^{1/2}, \end{aligned}$$

where $s = s(x)$ is the index at which the maximum occurs. The difference of the maxima in (19) and (20) is bounded by the maximum of the values $(2p_i/\pi)^{1/2} - (2\gamma p_i/\pi(\gamma + 1))^{1/2}$, $i = 1, \ldots, k$, which does not depend on $x$ and tends to zero as $\gamma$ tends to infinity.

Furthermore, the last difference in (20) is bounded by

(21) $$\sum_{j=1}^{k}\big|E\{|\Theta_j - x_j| \mid X = x\} - (2p_j/\pi)^{1/2}\big|.$$

Finally, from the fact that for every $j = 1, \ldots, k$,

(22) $$E\{|\Theta_j - x_j| \mid X = x\} = [\gamma p_j/(\gamma + 1)]^{1/2}\,E\big(\big|N + [\gamma(\gamma + 1)p_j]^{-1/2}x_j\big|\big),$$

where $N \sim N(0, 1)$ is an auxiliary random variable, and the fact that, marginally, $[(\gamma + 1)p_j]^{-1/2}X_j \sim N(0, 1)$, $j = 1, \ldots, k$, it is seen that the integral of (21) with respect to the marginal density of $(X_1, \ldots, X_k)$ tends to zero as $\gamma$ tends to infinity. To summarize, it has been shown that the integral of the difference of (20) and (19) with respect to the marginal density of $(X_1, \ldots, X_k)$ tends to zero as $\gamma$ tends to infinity.

To justify the relevance of this result, it remains to be shown that the limit of the Bayes risks is finite. The posterior risk of the Bayes rule can be written in the form

(23) $$c\big[E\{\Theta_{[k]} \mid X = x\} - \gamma(\gamma + 1)^{-1}x_{[k]}\big] + \min_{i=1,\ldots,k}\{c\gamma(\gamma + 1)^{-1}(x_{[k]} - x_i) + [2\gamma p_i/\pi(\gamma + 1)]^{1/2}\}.$$

It is now easy to see that the following provides an upper bound to (23):

(24) $$cE\Big\{\max_{j=1,\ldots,k}\big|[\gamma p_j/(\gamma + 1)]^{1/2}N_j\big|\Big\} + [2\gamma/\pi(\gamma + 1)]^{1/2}\max_{i=1,\ldots,k}\{p_i^{1/2}\},$$

where $N_1, \ldots, N_k$ is a sample from $N(0, 1)$. This bound does not depend on $x$, and it tends to a finite limit as $\gamma$ tends to infinity. Therefore, the proof of the theorem is completed.

To conclude this section, let us consider what Bayes rules $d^* = (s^*, e^*_{s^*})$ look like under loss function $C_2$. As mentioned before, the analysis is more complicated than under $C_1$. The risk function of a decision rule $d = (s, e_s)$ at parameter point $\theta$ is given by the following counterpart to (17):

(25) $R(\theta, d) = cE_\theta[(\theta_{[k]} - \theta_{s(X)})^2] + E_\theta[(\theta_{s(X)} - e_{s(X)}(X))^2].$

Under the normal prior considered before, the posterior risk at $X = x$ of any decision rule $d = (s, e^*_s)$ with $e_i^*(x) = (q_ix_i + p_i\mu_i)/(q_i + p_i)$, $i = 1, \ldots, k$, which is the estimate employed by the Bayes rule, turns out to be the following for selection $s(x) = i \in \{1, \ldots, k\}$:

(26) $$cE\{(\Theta_{[k]} - \Theta_i)^2 \mid X = x\} + \frac{q_ip_i}{q_i + p_i},$$

which has to be minimized by $s^*(x)$ over $i = 1, \ldots, k$. What makes this task difficult is the fact that, for any $i$, the conditional distribution of $(\Theta_{[k]}, \Theta_i)$ at $X = x$ does not allow for simpler representations of the conditional expectation in (26), which in most situations has to be evaluated on a computer.
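Since (26) generally has to be evaluated numerically, the Python sketch below estimates $E\{(\Theta_{[k]} - \Theta_i)^2 \mid X = x\}$ by Monte Carlo from the independent normal posteriors and returns the minimizing index. The helper name, the number of draws, the seed, and the use of `math.inf` for the noninformative limit $q_i \to \infty$ are illustrative assumptions; Case 2 corresponds to passing $q_i = \gamma p_i$. The same posterior draw is shared by all candidates $i$ (common random numbers), which keeps the comparison between them stable.

```python
import math
import random


def bayes_select_c2(x, p, mu, q, c, draws=50000, seed=0):
    """Monte Carlo sketch of s*(x) under loss C_2: minimize (26) over i,
    c * E{(Theta_[k] - Theta_i)^2 | X = x} + q_i p_i / (q_i + p_i).

    q_i = math.inf encodes the noninformative limit (e_i* = x_i, variance p_i).
    """
    k = len(x)
    mean, var = [], []
    for xi, pv, m, qv in zip(x, p, mu, q):
        if math.isinf(qv):
            mean.append(xi)
            var.append(pv)
        else:
            mean.append((qv * xi + pv * m) / (qv + pv))
            var.append(qv * pv / (qv + pv))
    sd = [math.sqrt(v) for v in var]
    rng = random.Random(seed)
    sq = [0.0] * k
    for _ in range(draws):
        # One joint posterior draw, shared by all candidates i.
        t = [rng.gauss(m, s) for m, s in zip(mean, sd)]
        top = max(t)
        for i in range(k):
            sq[i] += (top - t[i]) ** 2
    risk = [c * sq[i] / draws + var[i] for i in range(k)]
    s = min(range(k), key=risk.__getitem__)
    return s, mean[s], risk


inf = math.inf
# Equal precisions: the largest observation is selected.
print(bayes_select_c2([0.0, 1.5, 0.5], [1.0] * 3, [0.0] * 3, [inf] * 3, c=1.0)[0])
# Very unequal precisions: the precise, slightly smaller observation is selected.
print(bayes_select_c2([1.0, 0.9], [4.0, 0.01], [0.0] * 2, [inf] * 2, c=0.5)[0])
```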

At the end, let us see how much can be said about the Bayes rule under the three cases considered previously.

Case 1: Noninformative prior; or $q_i \to \infty$, $i = 1, \ldots, k$. In this case, $e_i^*(x) = x_i$, $i = 1, \ldots, k$, and $s^*(x)$ minimizes

(27) $$cE\Big(\Big[\max_{j=1,\ldots,k}\{x_j + p_j^{1/2}N_j\} - x_i - p_i^{1/2}N_i\Big]^2\Big) + p_i,$$

where $N_1, \ldots, N_k$ is a random sample from $N(0, 1)$.

Case 2: Prior variances proportional to sample variances; i.e. $q_i = \gamma p_i$, $i = 1, \ldots, k$. In this case, as before under $C_1$, we have $e_i^*(x) = (\gamma x_i + \mu_i)/(\gamma + 1)$, $i = 1, \ldots, k$, but $s^*(x)$ now minimizes (26) with $q_ip_i/(q_i + p_i) = \gamma p_i/(\gamma + 1)$, $i = 1, \ldots, k$.

Case 3: Posterior is decreasing in transposition (DT); i.e. $q_i^{-1} + p_i^{-1} = r^{-1}$, $i = 1, \ldots, k$, for some $r > 0$. This case is covered by Corollary 2, and thus the Bayes rule is the same as the one in Case 3 under $C_1$.